Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions
نویسندگان
چکیده
منابع مشابه
Learning to Learn without Gradient Descent by Gradient Descent
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-paramete...
متن کاملOn Nonconvex Decentralized Gradient Descent
Consensus optimization has received considerable attention in recent years. A number of decentralized algorithms have been proposed for convex consensus optimization. However, on consensus optimization with nonconvex objective functions, our understanding to the behavior of these algorithms is limited. When we lose convexity, we cannot hope for obtaining globally optimal solutions (though we st...
متن کاملOnline Learning, Stability, and Stochastic Gradient Descent
In batch learning, stability together with existence and uniqueness of the solution corresponds to well-posedness of Empirical Risk Minimization (ERM) methods; recently, it was proved that CVloo stability is necessary and sufficient for generalization and consistency of ERM ([9]). In this note, we introduce CVon stability, which plays a similar role in online learning. We show that stochastic g...
متن کاملLearning Rate Adaptation in Stochastic Gradient Descent
The efficient supervised training of artificial neural networks is commonly viewed as the minimization of an error function that depends on the weights of the network. This perspective gives some advantage to the development of effective training algorithms, because the problem of minimizing a function is well known in the field of numerical analysis. Typically, deterministic minimization metho...
متن کاملParallelized Stochastic Gradient Descent
With the increase in available data parallel machine learning has become an in-creasingly pressing problem. In this paper we present the first parallel stochasticgradient descent algorithm including a detailed analysis and experimental evi-dence. Unlike prior work on parallel optimization algorithms [5, 7] our variantcomes with parallel acceleration guarantees and it poses n...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: IEEE Transactions on Neural Networks and Learning Systems
سال: 2020
ISSN: 2162-237X,2162-2388
DOI: 10.1109/tnnls.2019.2952219